QUANTILE_REGRESSION

Overview

The QUANTILE_REGRESSION function fits a quantile regression model to estimate conditional quantiles of a response variable given predictor variables. Unlike ordinary least squares (OLS) regression, which estimates the conditional mean, quantile regression estimates specific percentiles (such as the median or any other quantile) of the response distribution. This makes it particularly valuable for understanding relationships across the entire distribution of outcomes, not just the average.

Quantile regression was introduced by Roger Koenker in 1978 and has become a standard tool in econometrics, ecology, and medical research. The method is especially useful when the relationship between variables differs at different points of the outcome distribution, or when dealing with heteroscedastic data where variance changes across observations. For a comprehensive treatment, see Koenker’s foundational text Quantile Regression (Cambridge University Press, 2005).

This implementation uses the statsmodels library’s QuantReg class, which solves the quantile regression problem using iterative reweighted least squares. For details, see the statsmodels QuantReg documentation.

The quantile regression estimator minimizes an asymmetric loss function called the check function (or pinball loss):

\hat{\beta}_\tau = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau(y_i - x_i'\beta)

where the loss function \rho_\tau(u) is defined as:

\rho_\tau(u) = u \cdot (\tau - \mathbf{1}_{u < 0}) = \begin{cases} \tau \cdot u & \text{if } u \geq 0 \\ (\tau - 1) \cdot u & \text{if } u < 0 \end{cases}

For \tau = 0.5, the check function reduces to the absolute value (up to a constant), making median regression equivalent to Least Absolute Deviations (LAD) regression. Key advantages of quantile regression include robustness to outliers in the response variable and the ability to characterize heterogeneous effects across the outcome distribution.

The function returns coefficients, standard errors, t-statistics, p-values, confidence intervals, and a pseudo R-squared measure of fit. The pseudo R-squared compares the minimized loss under the full model to an intercept-only model. For more theoretical background, see Quantile regression on Wikipedia.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=QUANTILE_REGRESSION(y, x, quantile, fit_intercept)
  • y (list[list], required): Dependent variable (response) as a column vector of numeric values.
  • x (list[list], required): Independent variables (predictors) as a matrix. Each column is a predictor; rows must match y.
  • quantile (float, optional, default: 0.5): Quantile level to estimate, between 0 and 1 (exclusive).
  • fit_intercept (bool, optional, default: true): If true, adds an intercept term to the model.

Returns (list[list]): 2D list with quantile regression results, or error string.

Examples

Example 1: Demo case 1

Inputs:

y x
1 1
2 2
3 3
4 4
5 5

Excel formula:

=QUANTILE_REGRESSION({1;2;3;4;5}, {1;2;3;4;5})

Expected output:

"non-error"

Example 2: Demo case 2

Inputs:

y x quantile
2.3 1 0.25
4.1 2
5.8 3
7.2 4
9.5 5
11.1 6

Excel formula:

=QUANTILE_REGRESSION({2.3;4.1;5.8;7.2;9.5;11.1}, {1;2;3;4;5;6}, 0.25)

Expected output:

"non-error"

Example 3: Demo case 3

Inputs:

y x fit_intercept
1.5 1 false
3.2 2
4.8 3
6.1 4
7.9 5

Excel formula:

=QUANTILE_REGRESSION({1.5;3.2;4.8;6.1;7.9}, {1;2;3;4;5}, FALSE)

Expected output:

"non-error"

Example 4: Demo case 4

Inputs:

y x quantile fit_intercept
3.1 1 0.75 true
5.8 2
8.2 3
10.5 4
13.1 5
15.8 6

Excel formula:

=QUANTILE_REGRESSION({3.1;5.8;8.2;10.5;13.1;15.8}, {1;2;3;4;5;6}, 0.75, TRUE)

Expected output:

parameter coefficient std_error t_statistic p_value ci_lower ci_upper
intercept 0.8
x1 2.5
pseudo_r_squared 0.9766

Python Code

import math
from statsmodels.regression.quantile_regression import QuantReg as sm_quantreg

def quantile_regression(y, x, quantile=0.5, fit_intercept=True):
    """
    Fits a quantile regression model to estimate conditional quantiles of the response distribution.

    See: https://www.statsmodels.org/stable/generated/statsmodels.regression.quantile_regression.QuantReg.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Dependent variable (response) as a column vector of numeric values.
        x (list[list]): Independent variables (predictors) as a matrix. Each column is a predictor; rows must match y.
        quantile (float, optional): Quantile level to estimate, between 0 and 1 (exclusive). Default is 0.5.
        fit_intercept (bool, optional): If true, adds an intercept term to the model. Default is True.

    Returns:
        list[list]: 2D list with quantile regression results, or error string.
    """
    def to2d(val):
        return [[val]] if not isinstance(val, list) else val

    def validate_2d_numeric(arr, name):
        if not isinstance(arr, list):
            return f"Invalid input: {name} must be a 2D list."
        if not arr:
            return f"Invalid input: {name} cannot be empty."
        for i, row in enumerate(arr):
            if not isinstance(row, list):
                return f"Invalid input: {name} must be a 2D list."
            if not row:
                return f"Invalid input: {name} cannot contain empty rows."
            for j, val in enumerate(row):
                if not isinstance(val, (int, float)):
                    return f"Invalid input: {name}[{i}][{j}] must be numeric."
                if math.isnan(val) or math.isinf(val):
                    return f"Invalid input: {name}[{i}][{j}] must be finite."
        return None

    # Normalize inputs to 2D lists
    y = to2d(y)
    x = to2d(x)

    # Validate inputs
    err = validate_2d_numeric(y, "y")
    if err:
        return err
    err = validate_2d_numeric(x, "x")
    if err:
        return err

    # Validate quantile
    if not isinstance(quantile, (int, float)):
        return "Invalid input: quantile must be numeric."
    if math.isnan(quantile) or math.isinf(quantile):
        return "Invalid input: quantile must be finite."
    if quantile <= 0 or quantile >= 1:
        return "Invalid input: quantile must be between 0 and 1 (exclusive)."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Extract y as 1D array
    y_flat = [row[0] for row in y]
    n_obs = len(y_flat)

    # Check x dimensions
    n_rows_x = len(x)
    if n_rows_x != n_obs:
        return f"Invalid input: x must have same number of rows as y ({n_obs})."

    # Extract x as 2D array
    n_cols_x = len(x[0])
    for row in x:
        if len(row) != n_cols_x:
            return "Invalid input: x must have consistent column count."

    # Add intercept column if requested
    if fit_intercept:
        x_data = [[1.0] + list(row) for row in x]
    else:
        x_data = [list(row) for row in x]

    # Fit quantile regression model
    try:
        model = sm_quantreg(y_flat, x_data)
        result = model.fit(q=quantile)
    except Exception as exc:
        return f"statsmodels error: {exc}"

    # Extract results
    try:
        params = result.params
        bse = result.bse
        tvalues = result.tvalues
        pvalues = result.pvalues
        conf_int = result.conf_int()
        prsquared = result.prsquared
    except Exception as exc:
        return f"Error extracting results: {exc}"

    # Build output table
    headers = ['parameter', 'coefficient', 'std_error', 't_statistic', 'p_value', 'ci_lower', 'ci_upper']
    output = [headers]

    # Add parameter rows
    for i in range(len(params)):
        if fit_intercept and i == 0:
            param_name = 'intercept'
        else:
            predictor_idx = i if not fit_intercept else i - 1
            param_name = f'x{predictor_idx + 1}'

        # Convert values to float or None if NaN
        def safe_float(val):
            fval = float(val)
            if math.isnan(fval):
                return None
            if math.isinf(fval):
                return f"Error: infinite value in result for {param_name}."
            return fval

        coef = safe_float(params[i])
        if isinstance(coef, str):
            return coef
        se = safe_float(bse[i])
        if isinstance(se, str):
            return se
        tval = safe_float(tvalues[i])
        if isinstance(tval, str):
            return tval
        pval = safe_float(pvalues[i])
        if isinstance(pval, str):
            return pval
        ci_low = safe_float(conf_int[i, 0])
        if isinstance(ci_low, str):
            return ci_low
        ci_high = safe_float(conf_int[i, 1])
        if isinstance(ci_high, str):
            return ci_high

        row = [param_name, coef, se, tval, pval, ci_low, ci_high]
        output.append(row)

    # Add pseudo R-squared row
    prsq_val = float(prsquared)
    if math.isinf(prsq_val):
        return "Error: infinite pseudo R-squared."
    if math.isnan(prsq_val):
        prsq_val = None

    output.append(['pseudo_r_squared', prsq_val, None, None, None, None, None])

    return output

Online Calculator